DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment

نویسندگان

  • Bonnie J. Dorr
  • Lisa Pearl
  • Rebecca Hwa
  • Nizar Habash
چکیده

The frequent occurrence of divergences|structural diier-ences between languages|presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Divergence Unraveling for Word Alignment of Parallel Corpora

We describe the use of parallel text for divergence unraveling in word-level alignment. DUSTer (Divergence Unraveling for Statistical Translation) is a system that combines linguistic and statistical knowledge to resolve structural differences between languages, i.e., translation divergences, during the process of alignment. Our immediate goal is to induce word-level alignments that are more ac...

متن کامل

Title of dissertation : COMBINING LINGUISTIC AND MACHINE LEARNING TECHNIQUES FOR WORD ALIGNMENT IMPROVEMENT

Title of dissertation: COMBINING LINGUISTIC AND MACHINE LEARNING TECHNIQUES FOR WORD ALIGNMENT IMPROVEMENT Necip Fazıl Ayan, Doctor of Philosophy, 2005 Dissertation directed by: Professor Bonnie J. Dorr Department of Computer Science Alignment of words, i.e., detection of corresponding units between two sentences that are translations of each other, has been shown to be crucial for the success ...

متن کامل

Improved Word-Level Alignment: Injecting Knowledge about MT Divergences

Under consideration for other conferences (specify)? none Abstract Word-level alignments of bilingual text (bitexts) are not only an integral part of statistical machine translation models, but also useful for lexical acquisition, treebank construction, and part-of-speech tagging. The frequent occurrence of divergences, structural diierences between languages, presents a great challenge to the ...

متن کامل

Improving Bitext Word Alignments via Syntax-based Reordering of English

We present an improved method for automated word alignment of parallel texts which takes advantage of knowledge of syntactic divergences, while avoiding the need for syntactic analysis of the less resource rich language, and retaining the robustness of syntactically agnostic approaches such as the IBM word alignment models. We achieve this by using simple, easily-elicited knowledge to produce s...

متن کامل

Multi-align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT

The continuously growing MT market faces the challenge of translating new languages, diverse genres, and different domains using a variety of available linguistic resources. As such, MT system adaptability has become a sought-after necessity. An adaptable statistical or Hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches p...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002